Texts and Social Users Using Time Series and Latent Topics

Authors

  • Tao Yang
  • Dongwon Lee
  • Prasenjit Mitra
  • Bruce G. Lindsay
Abstract

Knowledge discovery has attracted tremendous interest and seen rapid development in both text mining and social user mining. Its main purpose is to search massive volumes of data for patterns, the so-called knowledge. Knowledge can exist in different formats, such as text or numbers. It can be observed or hidden in different hierarchies. It can even be user-generated, such as the social content and social activity of the Web 2.0 era. In this dissertation, we study a series of new knowledge discovery techniques through four data mining applications. First, we propose a novel framework for mining text databases using time series, bridging two seemingly unrelated domains: alphabetic strings and numerical signals. We study how various transformation methods affect the accuracy and performance of detecting near-duplicate texts in record linkage. Second, we develop new topic models for mining text documents using latent topics, to tackle the noisy data problem in document modeling. We show how incorporating textual errors and topic dependency into the generative process affects the generalization performance of topic models. Third, we introduce novel methods for mining social content using time series to classify user interests. We show the accuracy of our approach in both binary and multi-class classification of the sports and political interests of social users. Finally, we introduce a generative modeling approach for mining social activity using latent topics to predict user attributes. We show the performance of our methods in predicting binary and multi-class demographic attributes of social users.

Similar resources

Temporal Identification of Latent Communities on Twitter

User communities in social networks are usually identified by considering explicit structural social connections between users. While such communities can reveal important information about their members such as family or friendship ties and geographical proximity, they do not necessarily succeed at pulling like-minded users that share the same interests together. In this paper, we are interest...

Using Linear Dynamical Topic Model for Inferring Temporal Social Correlation in Latent Space

The abundance of online user data has led to a surge of interest in understanding the dynamics of social relationships using computational methods. Utilizing users’ item adoption data, we develop a new method to compute the Granger-causal (GC) relationships among users. In order to handle the high-dimensional and sparse nature of the adoption data, we propose to model the relationships among ...

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...

Query expansion based on relevance feedback and latent semantic analysis

Web search engines are one of the most popular tools on the Internet which are widely-used by expert and novice users. Constructing an adequate query which represents the best specification of users’ information need to the search engine is an important concern of web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...

Analysis and Prediction of Question Topic Popularity in Community Q&A Sites: A Case Study of Quora

In the past few years, Quora, a community-driven social platform for questions and answers, has grown exponentially from a small community of users into one of the largest and most reliable sources of Q&A on the Internet. Quora has a built-in social structure integrated into its backbone; users can follow each other, follow questions, topics, etc. Apart from the social connections that Quora provides, it...

Publication date: 2014